This project is an analysis of fast food restaurants in the United States as of May 2019. This data, collected by Datafiniti and available on Kaggle (https://www.kaggle.com/datafiniti/fast-food-restaurants), has information on 10,000 fast food restaurants in the US, including name, address, city, and more.
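Using this data, I hope to better understand the fast food market in America. We see fast food all around us, but rarely stop to think about the market as a whole. To do so, I will break my analysis down into 3 questions: which fast food restaurants are the most popular, which states have the most fast food restaurants, and whether the number of fast food restaurants in a state is related to its population.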
Library imports:
library(ggplot2)
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tm)
## Loading required package: NLP
##
## Attaching package: 'NLP'
## The following object is masked from 'package:ggplot2':
##
## annotate
library(usmap)
library(knitr)
Next I import the dataset and look into the structure by using head to examine the first few observations.
ff_data <- read_csv("Datafiniti_Fast_Food_Restaurants.csv")
## Parsed with column specification:
## cols(
## id = col_character(),
## dateAdded = col_datetime(format = ""),
## dateUpdated = col_datetime(format = ""),
## address = col_character(),
## categories = col_character(),
## city = col_character(),
## country = col_character(),
## keys = col_character(),
## latitude = col_double(),
## longitude = col_double(),
## name = col_character(),
## postalCode = col_character(),
## province = col_character(),
## sourceURLs = col_character(),
## websites = col_character()
## )
kable(head(ff_data))
After a first look at the data, we see that there are many features, but this analysis only needs a few of them. To keep things organized and focused, I will trim the dataset to a smaller number of features.
ff_data <- ff_data[,c("city", "name", "province")]
kable(head(ff_data))
city | name | province |
---|---|---|
Thibodaux | SONIC Drive In | LA |
Thibodaux | SONIC Drive In | LA |
Pigeon Forge | Taco Bell | TN |
Pigeon Forge | Arby’s | TN |
Morrow | Steak ’n Shake | GA |
Detroit | Wendy’s | MI |
Now we have simplified the data to only 3 main features: city, name, and province (state). Our data now has 10,000 rows and 3 columns.
To do a bit of data cleaning, we convert all the names to lower case and remove any punctuation. This catches some of the duplicated names, for example Chick-fil-a vs. Chick-Fil-A. While this will catch most of the duplicates, there may still be repeats where a chain appears under different versions of its name. For example, five guys and five guys burgers and fries will not be combined. However, this does not appear to have a disruptive effect on the data analysis.
ff_data$name <- tolower(ff_data$name)
ff_data$name <- removePunctuation(ff_data$name)
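The cleaning step and its limitation can be illustrated on a small toy vector. This is a minimal sketch, not part of the original analysis: the names in `raw_names` are made up for illustration, and the base-R `gsub()` call is assumed to behave roughly like `tm::removePunctuation()` with its default settings. The prefix check at the end is one hypothetical way to surface name variants that may need manual merging.

```r
# Toy vector of restaurant names (illustrative, not taken from the dataset).
raw_names <- c("Chick-fil-a", "Chick-Fil-A", "Five Guys",
               "Five Guys Burgers and Fries")

# Lower-case and strip punctuation; assumed to be roughly equivalent to
# tolower() followed by tm::removePunctuation() with default settings.
clean_names <- gsub("[[:punct:]]", "", tolower(raw_names))

# Count how many raw spellings collapse onto each cleaned name.
table(clean_names)

# Flag cleaned names that are a prefix of another cleaned name. These are
# candidates for manual merging (e.g. "five guys" vs. its longer variant).
uniq <- unique(clean_names)
is_prefix <- sapply(uniq, function(a) any(uniq != a & startsWith(uniq, a)))
prefix_candidates <- uniq[is_prefix]
prefix_candidates
```

Here the two Chick-fil-A spellings collapse into one name, while the two Five Guys variants stay separate and are flagged by the prefix check.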
Having looked at the data and organized the parts we will use, we can now dive into our analysis.
For this analysis, we consider most popular to be those restaurants which are most abundant. Before looking at which restaurants are the most popular, let’s look at how many unique fast food restaurants this data set considers:
cat("Number of unique fast food restaurants:", length(unique(ff_data$name)))
## Number of unique fast food restaurants: 542
Next we want to see how common each of the fast food restaurants is.
ff_freqs <- ff_data %>%
  group_by(name) %>%
  summarise(freq = n())  # name the count column with `=`; `<-` would not name it
kable(head(ff_freqs))
name | freq |
---|---|
7eleven | 19 |
90 miles cuban cafe | 1 |
abruzzi pizza | 1 |
acropolis gyro palace | 1 |
adobe cantina salsa | 1 |
ak buffet | 1 |
Looking just at this brief print out of some of the frequencies, we see that many restaurants only have 1 recorded location. For our analysis we are mainly interested in the most popular restaurants. In order to focus in on those, we can order by frequency.
ff_freqs <- ff_freqs[order(-ff_freqs$freq),]
kable(head(ff_freqs))
name | freq |
---|---|
mcdonalds | 1948 |
taco bell | 1032 |
burger king | 833 |
subway | 833 |
arbys | 666 |
wendys | 628 |
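The group, summarise, and order steps above can also be collapsed into a single `dplyr::count()` call. The sketch below uses a small made-up data frame (`toy`) in place of `ff_data`, so the values are illustrative only.

```r
library(dplyr)

# Toy data standing in for ff_data (illustrative values only).
toy <- data.frame(
  name = c("mcdonalds", "subway", "mcdonalds",
           "taco bell", "mcdonalds", "subway"),
  stringsAsFactors = FALSE
)

# count() groups by name, tallies the rows, names the count column "freq",
# and sorts in descending order of frequency in one step.
toy_freqs <- toy %>%
  count(name, name = "freq", sort = TRUE)

toy_freqs
```

On the real data, the same call would be `ff_data %>% count(name, name = "freq", sort = TRUE)`.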
When we order by the most frequent restaurants, we see names that we are very familiar with. It is not surprising to see McDonalds at the top of the list.
Using the frequencies, we can construct a barplot of the top restaurants to visualize the data more easily.
ggplot(data=ff_freqs[1:15,], aes(x=reorder(name, -freq), y=freq)) + geom_bar(stat="identity", fill="dodgerblue") +
labs(title="Top fifteen fast food restaurants in America", x="Restaurant name", y="Number of restaurant locations") +
theme(axis.text.x = element_text(angle=90))
Looking at the restaurant names on the graph above, there don’t seem to be any surprises among the most abundant fast food restaurants. McDonalds far exceeds its competitors, with 916 more locations than second-place Taco Bell (1,948 vs. 1,032). After Wendys, which has about 600 locations, we see a drop-off in the number of restaurants the other main fast food players have.
Now that we know the most abundant/popular fast food chains in the US overall, I want to take a deeper dive into the state level. To do this we first need to find which fast food restaurant has the most locations in each state.
state_tops <- ff_data %>%
group_by(province) %>%
count(name) %>%
top_n(1)
## Selecting by n
state_tops <- distinct(state_tops, province, .keep_all = TRUE)
names(state_tops) <- c("state", "name", "n")
kable(head(state_tops))
state | name | n |
---|---|---|
AK | subway | 4 |
AL | taco bell | 3 |
AR | mcdonalds | 18 |
AZ | mcdonalds | 59 |
CA | mcdonalds | 158 |
CO | taco bell | 27 |
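One caveat with this approach is that `top_n(1)` keeps every tied leader, and the later `distinct()` call then keeps whichever tied row happens to come first. The sketch below shows one way to make ties explicit with `dplyr::slice_max()` (available in dplyr 1.0.0 and later). The data frame and the state codes "AA" and "BB" are hypothetical and only illustrate the behaviour.

```r
library(dplyr)

# Toy data: state "AA" has a tie between two chains, state "BB" does not.
toy <- data.frame(
  province = c("AA", "AA", "AA", "AA", "BB", "BB", "BB"),
  name = c("subway", "subway", "taco bell", "taco bell",
           "mcdonalds", "mcdonalds", "subway"),
  stringsAsFactors = FALSE
)

# Number of locations per chain within each state.
state_counts <- toy %>% count(province, name)

# Keep every chain tied for first place in its state.
tops_with_ties <- state_counts %>%
  group_by(province) %>%
  slice_max(n, n = 1, with_ties = TRUE) %>%
  ungroup()

# Keep exactly one row per state, dropping ties explicitly.
tops_single <- state_counts %>%
  group_by(province) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%
  ungroup()

# List the states where more than one chain shares the top spot.
tied_states <- tops_with_ties %>%
  count(province, name = "leaders") %>%
  filter(leaders > 1)

tied_states
```

Checking `tied_states` on the real data would show which states in the map below have a leader that was chosen by row order rather than by a clear majority.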
Before visualizing, let’s see how many different restaurants are the most popular in at least one state:
cat("Number of Unique Most Popular Restaurants:", length(unique(state_tops$name)))
## Number of Unique Most Popular Restaurants: 6
Now let’s look at this information on a map. With this visualization, we can see which fast food restaurant is the most popular in each state.
plot_usmap(data = state_tops, values = "name") +
scale_fill_discrete(name = "name") +
theme(legend.position = "right") + labs(title = "Most popular fast food restaurant per state")
The map above shows again that McDonalds is by far the most abundant/popular chain across the United States. The non-McDonalds states are most concentrated in the middle of the country.
To begin answering our next question, I want to look at the restaurants by state.
state_ff <- ff_data %>%
group_by(province) %>%
summarise(
n()
)
names(state_ff) <- c("state", "number")
kable(head(state_ff))
state | number |
---|---|
AK | 16 |
AL | 6 |
AR | 102 |
AZ | 330 |
CA | 1201 |
CO | 148 |
As we see here, the number of fast food restaurants recorded varies widely from state to state. Because we only have information on 10,000 restaurants in the US (and know there are more), this information is not complete. However, we will use it as a sample to understand the broader trends. One notable gap is Alabama: only 6 restaurants are reported there, which is clearly far below the true number.
Using the above state data, we can now create a barplot to visualize which states have the most fast food restaurants.
ggplot(data=state_ff, aes(x=reorder(state, -number), y=number)) + geom_bar(stat="identity", fill="pink") +
labs(title="Number of fast food restaurants per state", x="State", y="Number of restaurants") +
theme(axis.text.x = element_text(angle=90))
Looking at the barplot above, we see that California far exceeds every other state, with about 1,200 fast food restaurants. The next states in terms of most fast food restaurants are Texas, Florida, Ohio, Georgia, and Illinois.
Now let’s look at this information on a map to better visualize the geographic spread.
plot_usmap(data = state_ff, values = "number") +
scale_fill_continuous(low = "white", high = "blue", name = "Number of fast food restaurants", labels = scales::comma
) + theme(legend.position = "right") + labs(title = "Number of fast food restaurants across the US")
In this map, the darker the color the more fast food restaurants there are in that state. Instantly we see California stand out, as it is the only state in the darkest category. We then notice states like Texas, Florida, and Ohio, which were shown in the barplot as well to have high numbers of fast food restaurants.
States like California, Texas, and Florida have the most fast food restaurants according to this data, but a confounding variable may be at work. Question 3 will look at one possible reason these states top the list.
It is clear, both through going about our lives and through this data, that fast food is a very popular type of establishment in the United States today. We constantly pass different chains as we drive throughout cities. Through this analysis, we were able to take a deeper dive into characteristics of fast food restaurants in modern America.
Our first question asked which fast food restaurants are the most popular in the US. For this question, popular was synonymous with abundant. This simplification may not reflect the true popularity and sentiment of customers, though it is fair to assume that a chain would not have so many locations if it weren’t popular. From our analysis we learned that McDonalds is by far the most popular fast food restaurant. Other top competitors included Taco Bell, Burger King, Subway, and Arbys. We also learned that at the state level, only 6 different fast food chains were the most popular in any state. Once again, McDonalds came out on top. We saw the highest concentration of non-McDonalds leaders in the middle states of America.
Our next question looked at the locations of fast food restaurants. For this portion of the analysis we sought to discover in which US state fast food is most abundant. While the data was imperfect, we did discover some key trends. California far exceeded all other states in the number of fast food restaurants it has (~1200). Other states with very large numbers of restaurants include Texas, Florida, and Ohio. The discoveries from this portion of the analysis raised a concern that led into the next question: is the number of fast food restaurants related to population? Based on California, Texas, and Florida alone, without doing any analysis for question three, it seems like yes.
As stated above, question three investigated whether or not the number of fast food restaurants in a particular state is related to the population of that state. The results of question two began to indicate that this was true. Based on the state population data and the number of fast food restaurants per state from question two, we did find a strong linear relationship between the two variables. Knowing this, we were able to adjust the findings from question two to look at which states have the most fast food restaurants per capita. Frontrunners here included Wyoming, Arizona, and South Dakota. This is a much different list than before controlling for population.
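The per capita adjustment described here can be sketched as follows. This is only an illustration under stated assumptions: it uses the six state counts shown in the restaurants-per-state table, and it takes populations from the `state.x77` dataset that ships with base R (1975 estimates, in thousands), which is not the population data used in the analysis. The numbers it produces should therefore not be read as results.

```r
# Restaurant counts for six states, copied from the per-state table.
state_ff_sample <- data.frame(
  state = c("AK", "AL", "AR", "AZ", "CA", "CO"),
  number = c(16, 6, 102, 330, 1201, 148),
  stringsAsFactors = FALSE
)

# ASSUMPTION: 1975 populations from base R's state.x77 (in thousands),
# used only as a stand-in for the population data in the analysis.
pop <- data.frame(
  state = state.abb,
  population = state.x77[, "Population"] * 1000,
  stringsAsFactors = FALSE
)

# Join counts and populations on the two-letter state code.
merged <- merge(state_ff_sample, pop, by = "state")

# Strength of the linear relationship between count and population.
r <- cor(merged$number, merged$population)

# Simple linear fit of restaurant count on population.
fit <- lm(number ~ population, data = merged)

# Restaurants per 100,000 residents, sorted from highest to lowest.
merged$per_100k <- merged$number / merged$population * 1e5
merged[order(-merged$per_100k), c("state", "number", "per_100k")]
```

Dividing by population is what lets smaller states rise in the ranking, since the raw counts mostly track how many people live in each state.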
Overall, this analysis did not produce any groundbreaking discoveries, but rather helps us understand a common part of the world around us. Fast food certainly isn’t going anywhere for now, and as we’ve seen it is incredibly popular around the country. McDonalds takes the lead, but maybe some key competitors will break through in the future.